Fix #933: raise compaction memory_limit to 4 GB by erikdarlingdata · Pull Request #952 · erikdarlingdata/PerformanceMonitor

erikdarlingdata · 2026-05-12T04:25:45Z

Summary

Raises memory_limit for parquet compaction from 1GB → 4GB in Lite/Services/ArchiveService.cs (both single-pass and pair-merge code paths)
Updates the now-stale comment about spilling to describe what actually happens
Extends tools/CompactionRepro with stress-test scaffolding for future regressions

Why

#942 dropped the cap to 1 GB on the theory that a tighter limit + temp_directory would force earlier spilling and keep peak working set down. That validation ran on query_stats (narrow, ~1.7M rows). The reporter's failure is on query_snapshots, which carries query_text + query_plan + live_query_plan per row. Nightly logs (#933) show OOM at 906/953 MiB used with the 1 GB cap.

Standalone reproducer pinpoints the cause: parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer manager and can't be spilled. The cap acts as a hard ceiling for those, not a spill trigger. Spill on disk = 0 MB across every configuration tested (memory_limit 1/2/4 GB; accumulator vs tournament merge; threads 1 vs 2; :memory: vs file-backed DB). Same failure reproduces in standalone DuckDB CLI v1.5.2 — engine issue, not a binding artifact. Upstream: duckdb#16482, duckdb#10084.

DuckDB's OOM guide explicitly warns about this and recommends memory_limit at 50-60% of system RAM. 4 GB sits well inside that range and leaves real headroom on top of the un-spillable allocations.

Reporter's actual file sizes (provided in #933): 15-25 chunks of 2-6 MB plus a 35-45 MB monthly file per group, totaling ~97-143 MB per merge. Comfortably below where 4 GB has any trouble.

Test plan

dotnet build Lite/PerformanceMonitorLite.csproj -c Release — clean
Reproducer at tools/CompactionRepro succeeds at 4 GB on synthetic data shaped like query_snapshots:
- 200 MB / 15 chunks: peak WS 129 MB
- 1.5 GB / 15 chunks: peak WS 396 MB
Reproducer fails at 1 GB on the same data (matches reporter's nightly error string)
Reporter confirms next nightly resolves the OOM on their 4-server setup
Upstream comment posted to duckdb#16482 with the synthetic reproducer

Notes

tools/CompactionRepro gained --strategy {accumulator|tournament}, --db-mode {memory|file}, --merge-files, --synthetic (with --synthetic-rows and --synthetic-plan-kb), and --cycles for leak testing. Kept so a future regression in this area is easy to reproduce.

🤖 Generated with Claude Code

#942 lowered the cap to 1 GB on the theory that a tight memory_limit plus temp_directory would force DuckDB to spill earlier and keep peak working set down. That validation ran against query_stats (narrow, ~1.7M rows) and showed peak 1236 MB → 166 MB. The reporter's actual failure is on query_snapshots, which carries query_text + query_plan + live_query_plan per row. With the 1 GB cap, the nightly logs show OOM at "906/953 MiB used" before any merge progress. The standalone reproducer (tools/CompactionRepro) confirms the cause: parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer manager and can't be spilled. The cap acts as a hard ceiling for those, not a spill trigger. Spill on disk = 0 MB across every configuration we tested (memory_limit 1/2/4 GB, accumulator vs tournament merge, threads 1 vs 2, :memory: vs file-backed DB). The same failure reproduces in standalone DuckDB CLI v1.5.2, so it's an engine issue — see upstream issues duckdb#16482 and discussion#10084. DuckDB's own OOM guide explicitly warns about this case and recommends memory_limit at 50-60% of system RAM, not a tight cap. 4 GB sits well inside that range for typical workstation/server hosts and leaves real headroom on top of the un-spillable allocations. Reporter's actual file sizes (15-25 chunks of 2-6 MB plus a 35-45 MB monthly file per group) are well below the level where 4 GB has any trouble. The reproducer confirms 4 GB succeeds on a synthetic query_snapshots-shaped dataset of ~1.5 GB with peak working set of ~400 MB; the reporter's data is ~143 MB at worst. Also updates the stale comment about spilling — temp_directory was set per #935 but the buffer-manager-bypassing allocations don't use it. The comment now describes what actually happens. The tools/CompactionRepro changes add --strategy {accumulator|tournament}, --db-mode {memory|file}, --merge-files, --synthetic data generation, and --cycles for leak testing. These are kept so a future regression in this area can be reproduced and diagnosed quickly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nsiently for COPY erikdarlingdata#933's titled complaint is "Memory usage on client": Lite holds ~2.7-2.9 GB after 10 minutes with 4 servers. The compaction OOMs everyone has been chasing in this thread are a downstream symptom — by the time compaction runs the app already holds 2.7 GB, leaving little headroom on the reporter's 16 GB / ~1.6 GB-free machine. Root cause: the main DuckDB ConnectionString set no memory_limit, so the buffer pool ran at the DuckDB default of 80% of system RAM (~12.8 GB on a 16 GB box). With archive parquet files accumulating on disk, every UI query over an archive view caches pages and the buffer pool grows freely. The fix has to navigate one wrinkle: parquet COPY in DuckDB v1.5.2 hits a buffer-manager-bypass pre-reservation that needs ~2-4 GB headroom. Capping the main connection at 1 GB statically would break ExportToParquet and the two COPY paths in ArchiveAllAndResetAsync. So: - ConnectionString: memory_limit=1GB (caps resting buffer pool — addresses the actual complaint by stopping the archive-page cache from growing unbounded). - Around each parquet COPY on the main connection: SET memory_limit='4GB', run the COPY, SET back to '1GB'. Factored into a WithRaisedCopyMemoryLimit helper so the three call sites stay consistent (ExportToParquet, and the two COPYs in ArchiveAllAndResetAsync). - Compaction connections (separate :memory: instances) keep their 4 GB cap from erikdarlingdata#952. Verified against DuckDB CLI v1.5.2 with synthetic query_snapshots-shaped data: - COPY table→parquet at 256MB/512MB/1GB: OOMs (pre-reservation, matches the read_parquet→parquet path we saw in erikdarlingdata#952 testing). - COPY table→parquet at 2GB/4GB: succeeds, peak RSS well under cap. - INSERT (Appender) and SELECT (including GROUP BY across 11k rows) work fine at 256MB cap — confirms collectors and UI queries don't have the pre-reservation behavior and aren't affected by the resting cap. Tradeoff: the resting cap forces buffer-pool eviction of cached archive parquet pages. Long-range historical UI queries that re-scan many parquet files will do more disk I/O. Live/recent-data queries against the hot DB are unaffected (hot DB is small enough to fit in 1 GB easily). Plus the per-merge-step BuildSelectClause from the previous commit fixes the separate query_store_stats Binder Error on archives that span the v13 schema change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erikdarlingdata mentioned this pull request May 12, 2026

[BUG] Memory usage on client #933

Closed

erikdarlingdata merged commit 06db2d7 into dev May 12, 2026
2 checks passed

erikdarlingdata deleted the feature/933-bump-compaction-memory-limit branch May 12, 2026 04:49

erikdarlingdata mentioned this pull request May 15, 2026

Fix #933: cap main DuckDB memory_limit, per-pair compaction exclude detection #955

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #933: raise compaction memory_limit to 4 GB#952

Fix #933: raise compaction memory_limit to 4 GB#952
erikdarlingdata merged 1 commit into
devfrom
feature/933-bump-compaction-memory-limit

erikdarlingdata commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

erikdarlingdata commented May 12, 2026

Summary

Why

Test plan

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant